Duration: 90 mins
Level: Intermediate
Pre-requisite Skills: Python
What this Use Case will teach you
At the end of this use case you will:
A brief introduction to CLUE data
The City of Melbourne conducts a bi-annual comprehensive survey of its residents and businesses called the "Census of Land Use and Employment (CLUE)". CLUE captures key information on land use, employment, and economic activity across the City of Melbourne.
CLUE datasets are a valuable tool for businesses looking to invest in the City of Melbourne and researchers wanting to understand those factors that influence and shape the dynamics of Australia's second largest metropolis and one of the world's most liveable cities.
CLUE data assists the City of Melbourne's business planning, policy development and strategic decision making. Investors, consultants, students, urban researchers, property analysts, businesses and developers can take advantage of CLUE to understand customers, the marketplace and the changing form and nature of the city.
Source: CLUE ( https://data.melbourne.vic.gov.au/stories/s/CLUE/rt3z-vy3t?src=hdr )
This use case makes extensive use of various CLUE datasets to illustrate their value to Data Scientists, Researchers and Software Developers.
CLUE Data is often coded to a specific location (Latitude and Longitude) and/or to a City precinct, referred to as the "CLUE small area". Datasets may also include the individual city block within a precinct referred to by its CLUE Block ID.
The geospatial coordinates describing these areas as polygons can be downloaded in GeoJSON format and used to show shaded areas on a map, known as a choropleth map. This can be a useful technique for illustrating broad trends or statistics for a city area rather than a specific location.
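As a rough illustration of the polygon structure described above, the sketch below builds a tiny GeoJSON FeatureCollection by hand and indexes its features by block id. The property names block_id and clue_area match those in the CLUE Block extract used later in this use case; the coordinates and the helper name index_features are invented for illustration only.

```python
import json

# A hand-written, illustrative FeatureCollection with a single block polygon.
# The coordinates are invented; real CLUE polygons have many more points.
# GeoJSON positions are ordered [longitude, latitude].
toy_geojson = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"block_id": "1", "clue_area": "Melbourne (CBD)"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[144.955, -37.821], [144.957, -37.821],
                         [144.957, -37.820], [144.955, -37.820],
                         [144.955, -37.821]]]
      }
    }
  ]
}
""")


def index_features(geojson, key="block_id"):
    """Map each feature's identifying property to the feature itself."""
    return {f["properties"][key]: f for f in geojson["features"]}


features_by_block = index_features(toy_geojson)
outer_ring = features_by_block["1"]["geometry"]["coordinates"][0]
print(features_by_block["1"]["properties"]["clue_area"], len(outer_ring))
```

A polygon ring is closed, so its first and last positions are the same point.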
A map visualisation of CLUE Blocks and small areas can be found at the following links:
Which CLUE data should I use?
To begin our exploration and use of CLUE data we shall first import the necessary libraries to support our exploratory data analysis and visualisation.
Following are core packages required for this exercise.
The sodapy package is required specifically for accessing open data from SOCRATA compliant open data web sites.
The plotly.express package lets us build interactive maps using Mapbox services.
import os
import time
from datetime import datetime
import numpy as np
import pandas as pd
from sodapy import Socrata
import plotly.graph_objs as go
import plotly.express as px
To connect to the Melbourne Open Data Portal we create a client with the sodapy library, specifying the website domain where the data is hosted and, optionally, an application access token. A token can be requested from the City of Melbourne Open Data portal by registering here (https://data.melbourne.vic.gov.au/signup).
For this exercise we will log in anonymously.
apptoken = os.environ.get("SODAPY_APPTOKEN") # None when the variable is unset, giving anonymous access
domain = "data.melbourne.vic.gov.au"
client = Socrata(domain, apptoken) # Open Dataset connection
WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
Next, we will look at one of the CLUE datasets to better understand its structure and how we can use it.
Our data requirements from this use case include the following:
For this exercise, we shall start by examining the Residential Dwelling dataset. Each dataset in the Melbourne Open Data Portal has a unique identifier which can be used to retrieve the dataset using the sodapy library.
The Residential Dwelling dataset unique identifier is rm92-h5tq. We will pass this identifier into the sodapy command below to retrieve this data.
This dataset is placed in a Pandas dataframe and we will inspect the first three rows.
# Retrieve the "CLUE Residential Dwellings 2020" dataset
data_rm92_h5tq = pd.DataFrame.from_dict(client.get_all('rm92-h5tq'))
print(f'The shape of dataset is {data_rm92_h5tq.shape}.')
print('Below are the first few rows of this dataset:')
# Transpose the DataFrame for easier visual comparison.
data_rm92_h5tq.head(3).T
The shape of dataset is (10402, 10).
Below are the first few rows of this dataset:
| | 0 | 1 | 2 |
|---|---|---|---|
| census_year | 2020 | 2020 | 2020 |
| block_id | 1 | 1 | 11 |
| pbs_property_id | 611394 | 611395 | 103957 |
| bps_base_id | 611394 | 611395 | 103957 |
| street_name | 545-557 Flinders Street MELBOURNE VIC 3000 | 561-581 Flinders Street MELBOURNE VIC 3000 | 517-537 Flinders Lane MELBOURNE VIC 3000 |
| clue_small_area | Melbourne (CBD) | Melbourne (CBD) | Melbourne (CBD) |
| dwelling_type | Residential Apartments | Residential Apartments | Residential Apartments |
| dwelling_number | 196 | 189 | 26 |
| x_coordinate | 144.9565145 | 144.9559094 | 144.9566569 |
| y_coordinate | -37.82097941 | -37.82108687 | -37.81987147 |
We can see that there are 10,402 records and 10 fields describing each record.
Each record shows the number of dwellings for an individual property and the type of dwelling, e.g. House/Townhouse, Residential Apartments, etc.
The location of each property is given using:
The Census Year that the data was collected is also shown.
For our analysis of this dataset and others we will be restricting our analysis to the 2020 CLUE Census and summarising the data to CLUE Block level.
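The 2020 restriction mentioned above could be applied with a simple boolean filter. The sketch below uses a toy dataframe with invented values; the constant CENSUS_YEAR is a hypothetical name. It assumes census_year has already been cast to an integer; if the column is still text, compare against '2020' instead.

```python
import pandas as pd

# Toy records standing in for a multi-year CLUE extract (values are invented).
toy = pd.DataFrame({
    "census_year": [2019, 2020, 2020],
    "block_id": ["1", "1", "11"],
    "dwelling_number": [190, 196, 26],
})

CENSUS_YEAR = 2020  # hypothetical constant naming the census year of interest

# Keep only the rows collected in the chosen census year.
toy_2020 = toy[toy["census_year"] == CENSUS_YEAR].reset_index(drop=True)
print(toy_2020)
```

If a dataset only contains one census year the filter changes nothing, so it is cheap to apply before summarising.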
Summarising Residential Dwelling data
We want to plot the density of both residential dwellings and employment at city block level rather than specific property or address and so should visualise this data as a choropleth map.
Let's start by summarising the data at CLUE small area and Block level.
Note: We include CLUE Small Area as one of our group by fields so we can display the CLUE Small area name in the popup window when you hover over the area on the map.
We want to summarise the data by summing the number of dwellings across all rows in the same CLUE Block.
The following cell creates a dataframe containing this summary of residential dwellings.
# Cast datatypes to correct type so we can summarise
data_rm92_h5tq[['census_year', 'dwelling_number']] = data_rm92_h5tq[['census_year', 'dwelling_number']].astype(int)
data_rm92_h5tq[['x_coordinate', 'y_coordinate']] = data_rm92_h5tq[['x_coordinate', 'y_coordinate']].astype(float)
data_rm92_h5tq = data_rm92_h5tq.convert_dtypes() # convert remaining to string
data_rm92_h5tq.dtypes
# create the aggregate dataset
groupbyfields = ['block_id','clue_small_area']
aggregatebyfields = {'dwelling_number': ["sum"]}
dwellingsByBlock = pd.DataFrame(data_rm92_h5tq.groupby(groupbyfields, as_index=False).agg(aggregatebyfields))
# DataFrame groupby with a list of aggregations creates two levels of column headings
# so we flatten the headings to make it easier to extract data for plotting
dwellingsByBlock.columns = dwellingsByBlock.columns.map(''.join) # flatten column header
dwellingsByBlock.rename(columns={'clue_small_area': 'clue_area'}, inplace=True) #rename to match GeoJSON extract
dwellingsByBlock.rename(columns={'dwelling_numbersum': 'dwelling_count'}, inplace=True)
dwellingsByBlock.head(5)
| | block_id | clue_area | dwelling_count |
|---|---|---|---|
| 0 | 1 | Melbourne (CBD) | 385 |
| 1 | 101 | West Melbourne (Residential) | 863 |
| 2 | 103 | Melbourne (CBD) | 638 |
| 3 | 104 | Melbourne (CBD) | 1093 |
| 4 | 105 | Melbourne (CBD) | 1729 |
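As an alternative to the flatten and rename steps above, pandas named aggregation can produce a single level of column headings directly. The following is a minimal sketch on toy data, not a replacement for the cell above; the first two dwelling counts come from the rows shown earlier and the remaining values are invented.

```python
import pandas as pd

# Toy property-level records; block_id is kept as text, as in the tutorial data.
records = pd.DataFrame({
    "block_id": ["1", "1", "11", "101"],
    "clue_small_area": ["Melbourne (CBD)", "Melbourne (CBD)",
                        "Melbourne (CBD)", "West Melbourne (Residential)"],
    "dwelling_number": [196, 189, 26, 40],
})

# Named aggregation: output_column=(input_column, function).
# This yields flat column names, so no header flattening is needed.
summary = (
    records.groupby(["block_id", "clue_small_area"], as_index=False)
    .agg(dwelling_count=("dwelling_number", "sum"))
    .rename(columns={"clue_small_area": "clue_area"})
)
print(summary)
```

Because block_id is text, the groups sort as strings ("1", "101", "11"), which is the same ordering seen in the summary table above.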
Visualising Residential Dwelling on a Choropleth Map
We use the Plotly Python Open Source Graphing Library to generate maps from mapbox.
Creating a choropleth map requires us to know the geometry (shape) of each CLUE Block area as a collection of latitude and longitude points defining a polygon. This data can be downloaded from the Melbourne Open Data Portal in GeoJSON format.
We also need to supply the data to be used to highlight the CLUE Blocks and that data must include the same unique identifier for each Block contained in the GeoJSON data set.
Below we extract the Melbourne CLUE Block polygons into a JSON datatype.
The final line in the cell displays the unique key for each polygon which must also exist in the Residential Dwelling dataset.
from urllib.request import urlopen
import json
geoJSON_Id = 'aia8-ryiq' # Melbourne CLUE Block polygons in GeoJSON format
GeoJSONURL = 'https://'+domain+'/api/geospatial/'+geoJSON_Id+'?method=export&format=GeoJSON'
with urlopen(GeoJSONURL) as response:
    block = json.load(response)
block["features"][0]['properties'].keys()
dict_keys(['block_id', 'clue_area'])
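Since the choropleth relies on the block identifiers matching, it may be worth checking for identifiers that appear in the data but have no polygon. The helper below is a hypothetical sketch (unmatched_ids is not part of any library) run against a toy FeatureCollection; identifiers are compared as strings on the assumption that either side might hold them as text or numbers.

```python
def unmatched_ids(data_ids, geojson, key="block_id"):
    """Return ids present in the data but absent from the GeoJSON features."""
    polygon_ids = {str(f["properties"][key]) for f in geojson["features"]}
    return sorted(set(map(str, data_ids)) - polygon_ids)


# Toy FeatureCollection with two blocks; geometry is omitted for brevity.
toy_geojson = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"block_id": "1"}, "geometry": None},
    {"type": "Feature", "properties": {"block_id": "101"}, "geometry": None},
]}

missing = unmatched_ids(["1", 101, "999"], toy_geojson)
print(missing)  # ['999']
```

With the objects from this use case the call would look like unmatched_ids(dwellingsByBlock['block_id'], block); any identifiers returned would simply not be shaded on the map.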
Now, with a single call to the 'choropleth_mapbox' function, we can display an interactive map visualisation that uses the block GeoJSON data to define the regions and the dwellingsByBlock dataframe to supply the summarised data for each block.
# Display the choropleth map
fig = px.choropleth_mapbox(dwellingsByBlock, # pass in the summarised dwellings per block
geojson=block, # pass in the GeoJSON data defining the CLUE Block polygons
locations='block_id', # define the unique identifier for the Blocks from the dataframe
color='dwelling_count', # change the colour of the block region according to the dwelling count
color_continuous_scale=["#FFFF88", "yellow", "orange", "orange",
"orange", "darkorange", "red", "darkred"], # define custom colour scale
range_color=(0, dwellingsByBlock['dwelling_count'].max()), # set the numeric range for the colour scale
featureidkey="properties.block_id", # define the Unique polygon identifier from the GeoJSON data
mapbox_style="stamen-toner", # set the visual style of the map
zoom=12.15, # set the zoom level
center = {"lat": -37.813, "lon": 144.945}, # set the map centre coordinates
opacity=0.5, # opacity of the choropleth polygons
hover_name='clue_area', # the title of the hover pop up box
hover_data={'block_id':True,'dwelling_count':True}, # defines which dataframe fields to display
# in the hover popup box
labels={'dwelling_count':'Number of Dwellings','block_id':'CLUE Block Id'}, # defines labels for
# the hover popup box
title='Residential Dwellings by CLUE Block Id for 2020', # Title for plot
width=950, height=800 # dimensions of plot in pixels
)
fig.show()
Congratulations!
You've successfully used Melbourne CLUE Open Data and Plotly to visualise residential density in the City of Melbourne!
Now zoom in and out on the map above to explore the city and areas of high and low residential density.
This is your first step to selecting a suitable location for your new business!
Visualising Residential Density and Cafe or Restaurant Seating
To build our view of cafe venue seating and how it relates to residential density we need to visualise both datasets on the same interactive map view.
We can do this by adding a new layer (or "trace" as it is called in Plotly) to our previous map of residential density.
Let's extract the Melbourne CLUE Cafe, restaurant, bistro seats dataset and summarise it so it's ready to plot.
# Pull dataset from Melbourne Open Data Portal
data_dyqx_cfn5 = pd.DataFrame.from_dict(client.get_all('dyqx-cfn5')) # Melbourne CLUE Cafe, restaurant, bistro seats
# Cast columns to correct data type
integer_columns = ['census_year', 'block_id', 'property_id', 'base_property_id', 'industry_anzsic4_code', 'number_of_seats']
fp_columns = ['x_coordinate', 'y_coordinate']
data_dyqx_cfn5[integer_columns] = data_dyqx_cfn5[integer_columns].astype(int)
data_dyqx_cfn5[fp_columns] = data_dyqx_cfn5[fp_columns].astype(float)
data_dyqx_cfn5 = data_dyqx_cfn5.convert_dtypes() # convert remaining to string
# Summarise venue seating by location
groupbyfields = ['clue_small_area','block_id','y_coordinate','x_coordinate']
aggregatebyfields = {'number_of_seats': ["sum"]}
seatsByLocn = pd.DataFrame(data_dyqx_cfn5.groupby(groupbyfields, as_index=False).agg(aggregatebyfields))
seatsByLocn.columns = seatsByLocn.columns.map(''.join) # flatten column header
seatsByLocn.rename(columns={'clue_small_area': 'clue_area'}, inplace=True) #rename to match GeoJSON extract
seatsByLocn.rename(columns={'number_of_seatssum': 'number_of_seats'}, inplace=True) #restore the original column name after flattening
seatsByLocn['number_of_seats'] = seatsByLocn['number_of_seats'].astype(int)
# Calculate scale for drawing each bubble on scatter map plot
all_data_diffq = (seatsByLocn["number_of_seats"].max() - seatsByLocn["number_of_seats"].min()) / 16
seatsByLocn['scale'] = (seatsByLocn["number_of_seats"] - seatsByLocn["number_of_seats"].min()) / all_data_diffq + 1
seatsByLocn['scale'] = seatsByLocn['scale'].astype(int)+2
seatsByLocn.head(10)
| | clue_area | block_id | y_coordinate | x_coordinate | number_of_seats | scale |
|---|---|---|---|---|---|---|
| 0 | Carlton | 203 | -37.796707 | 144.965534 | 51 | 3 |
| 1 | Carlton | 203 | -37.796680 | 144.964900 | 42 | 3 |
| 2 | Carlton | 204 | -37.797833 | 144.965174 | 50 | 3 |
| 3 | Carlton | 204 | -37.797255 | 144.965754 | 120 | 3 |
| 4 | Carlton | 205 | -37.799470 | 144.964893 | 96 | 3 |
| 5 | Carlton | 205 | -37.799001 | 144.964765 | 80 | 3 |
| 6 | Carlton | 205 | -37.798721 | 144.965257 | 41 | 3 |
| 7 | Carlton | 206 | -37.800457 | 144.966558 | 51 | 3 |
| 8 | Carlton | 206 | -37.800191 | 144.966716 | 140 | 3 |
| 9 | Carlton | 206 | -37.800046 | 144.966741 | 115 | 3 |
Above we can see our summary dataframe has calculated the total number of seats (indoor and outdoor) at each unique location (latitude and longitude).
Since there is such a wide variance in venue seating across the city we need to scale the size of the bubbles drawn on the map to just a few (16) distinct sizes.
We set the lowest scale to 3 to ensure even the smallest venue's bubble is large enough when one zooms in at block level.
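To make the scaling arithmetic concrete, the sketch below restates the formula from the cell above as a plain Python function and applies it to invented seat counts. The function name bubble_scale and the guard for identical minimum and maximum values are additions for illustration and are not part of the original cell.

```python
def bubble_scale(seats, lowest, highest, steps=16):
    """Map a seat count onto an integer bubble size, mirroring the formula above."""
    if highest == lowest:
        return 3  # added guard: avoid dividing by zero when all counts are equal
    step_width = (highest - lowest) / steps
    return int((seats - lowest) / step_width + 1) + 2


# Invented seat counts spanning a wide range, from 1 to 1601 seats.
sizes = [bubble_scale(s, lowest=1, highest=1601) for s in (1, 100, 801, 1601)]
print(sizes)  # [3, 3, 11, 19]
```

With this formula the smallest venue maps to 3 and the largest to 19. When the largest venue is far bigger than the typical one, most venues fall into the first step, which would be consistent with every row in the table above showing a scale of 3.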
The next step is to display both the Choropleth and Scatter maps. We first draw the choropleth map showing residential density. We then draw the scatter plot assigning it as a trace (aka "layer") to the existing figure then show both.
# Plot residential density and venue seating
fig = px.choropleth_mapbox(dwellingsByBlock, geojson=block, locations='block_id', color='dwelling_count',
color_continuous_scale=["#FFFF88", "yellow", "orange", "orange",
"orange", "darkorange", "red", "darkred"],
range_color=(0, dwellingsByBlock['dwelling_count'].max()),
featureidkey="properties.block_id",
mapbox_style="stamen-toner", #"carto-positron",
zoom=12.15,
center = {"lat": -37.813, "lon": 144.945},
opacity=0.5,
hover_name='clue_area',
hover_data={'block_id':True,'dwelling_count':True},
labels={'dwelling_count':'Number of Dwellings','block_id':'CLUE Block Id'},
title='Residential Dwellings Density & Venue Seating (2020)',
width=950, height=800
)
# Plot of venue seating
fig2 = px.scatter_mapbox(seatsByLocn, lat="y_coordinate", lon="x_coordinate", size="scale",
mapbox_style="stamen-toner",
zoom=12.15,
center = {"lat": -37.813, "lon": 144.945},
opacity=0.70,
hover_name="clue_area",
hover_data={"block_id":True,"scale":False,"number_of_seats":True,"x_coordinate":False,"y_coordinate":False},
color_discrete_sequence=['purple'],
labels={'number_of_seats':'Number of Seats', 'block_id':'CLUE Block Id'},
width=950, height=800)
fig.add_trace(fig2.data[0])
fig.update_geos(fitbounds="locations", visible=False)
fig.show()
Congratulations!
You've successfully used Melbourne CLUE Open Data and Plotly to visualise residential density and venue seating in the City of Melbourne in one map!
Now zoom in and out on the map above to explore the city and areas of high residential density but low venue seating.
This could be a possible location for your new business!
Interactive Visualisation: Melbourne Business Locator Tool
This interactive visualisation uses CLUE data to let users explore opportunities for locating their hospitality business in the City of Melbourne.